Abstract. This paper introduces the theory and practice of formal verification of self-assembling
systems. We interpret a well-studied abstraction of nanomolecular self-assembly, the Abstract Tile
Assembly Model (aTAM), into Computation Tree Logic (CTL), a temporal logic often used in model
checking. We then consider the class of "rectilinear" tile assembly systems. This class includes most
aTAM systems studied in the theoretical literature, and all (algorithmic) DNA tile self-assembling
systems that have been realized in laboratories to date. We present a polynomial-time algorithm that,
given a tile assembly system T as input, either provides a counterexample to T's rectilinearity or verifies
whether T has a unique terminal assembly. Using partial order reductions, the verification search space
for this algorithm is reduced from exponential size to $O(n^2)$, where $n \times n$ is the size of the assembly
surface. That reduction is asymptotically the best possible. We report on experimental results obtained
by translating tile assembly simulator files into a Petri net format manipulable by the SMART model
checking engines devised by Ciardo et al. The model checker runs in $O(|T| \cdot n^4)$ time, where $|T|$ is the
number of tile types in tile assembly system T, and $n \times n$ is the surface size. Atypically for a model
checking problem, in which the practical limit usually is insufficient memory to store the state space,
the limit in this case was the amount of memory required to represent the rules of the model. (Storage
of the state space and of the reachability graph was small by comparison.) We discuss how to overcome
this obstacle by means of a front end tailored to the characteristics of self-assembly.
1 Introduction

The emerging field of algorithmic nanomolecular self-assembly began in the mid-1990s, when Adleman,
Rothemund, Winfree and others demonstrated through both mathematical rigor and experimentation
that it was possible to "program matter" by designing sets of DNA molecules that would
spontaneously combine together into desired shapes [19]. Perhaps the biggest practical success so
far has been the technology of "DNA origami," which is now being used in joint research between
IBM and Caltech to build a microchip with transistors placed closer together than ever before [9].
Part of the reason this technology is "ahead of" other DNA self-assembly technologies is its low
error rate, especially compared to "DNA tile" self-assembly. Understanding the behavior of DNA
tiles is difficult, even if they bind error-free, and when one considers a tile assembly system designed
to perform error correction, the analysis can be much more complex. Tools of formal verification
have been extremely useful in other areas of computer science, especially when concurrency and
nondeterminism can produce unexpected (and undesirable) executions. To date, however, methods
of formal verification have not been applied to self-assembling systems. Instead, self-assembly research
is reminiscent of concurrent-system research in the early 1980s: the best method to verify
that a construction works is to run it in a simulator multiple times, and watch for any bad behavior.
Of course, this provides no guarantee against rare occurrences of bad behavior, which was one
motivation for the introduction of formal methods of verification. In this paper, we are similarly
motivated: we present a theory for the formal verification of DNA tile self-assembly, and we report
on initial experiments performed with a model checking tool implementing this theory.

Self-assembly is a process in which small objects, which communicate and connect only with 
their local neighbors, form global structures. Algorithmic self-assembly studies self-assembly through 
the "algorithmic lens," and, in particular, considers the design and complexity of self-assembling
systems. While researchers in robotics [10] and amorphous computing [13] are active in this area,
in this paper, we focus on nanomolecular algorithmic self-assembly, achieved in the lab by building
"tiles" out of DNA molecules, using techniques pioneered by Seeman [21]. A powerful mathematical
abstraction of molecular behavior is the Abstract Tile Assembly Model (aTAM), due to Winfree [23]
and Rothemund [17]. There is, at this point, an extensive literature on the aTAM, several variations
of it, different complexity measures, and upper and lower bounds to assemble different shapes. 

The formalisms in the aTAM include the following: a finite set of distinct types of self-assembling 
agents, a set of local binding rules that completely determines the behavior of the agents, and an 
initial configuration of the system. A particular self-assembly "run" starts with an operator placing 
a finite seed assembly on the surface, and then allowing a "solution" containing infinitely many 
of each agent type to mix on the surface. Agents bind nondeterministically to the seed assembly, 
and to the growing configuration, consistent with the local rules. In the tile assembly models we 
consider in this paper, each agent is a four-sided tile, and the assembly surface is the first quadrant 
of the two-dimensional integer plane. 

In general, a tile assembly system (TAS) defined in the aTAM may have infinitely-long execution 
paths, and there is research into the assembly of infinite shapes such as fractals [11]. Nevertheless,
proposed applications, and laboratory experiments, have focused on the correct construction of 
finite, bounded structures. We make use of this to design a theory of formal verification for the 
aTAM: we take as input both a tile assembly system and a bound on the assembly surface, and then 
perform model checking on the behavior of that assembly system on that surface. This guarantees 
that the set of legally reachable configurations (i.e., the transition system) is finite, so it is amenable 
to well-understood model-checking algorithms. With this bound on surface size, we interpret the 
aTAM into CTL (Computation Tree Logic, a popular model checking formalism [1]). The question,
"Does tile assembly system T have a terminal assembly bounded by n x n?" then reduces to a
model checking problem for CTL. We present the reduction of the aTAM to CTL in Section 3.

A fixed CTL model checking problem is decidable in time linear in the size of the system it 
is checking. The size of each system relevant here is determined by the number of possible legal
configurations (reachable state space) and the number of legal transitions between those
configurations (number of edges). Since all assembly surfaces are a finite subset of Z x Z, each possible legal
configuration is isomorphic to a connected grid graph. A transition to another configuration takes
place when one tile is added to the current assembly. Hence, for large n, the number of potential
transitions (edges) is sparse, due to the sparseness of edges of the underlying grid graph. The
parameter that dominates is the size of the state space, which, even for tile assembly systems that
are very "nice," can grow exponentially. Therefore, it is essential to reduce the size of the space we
need to search, if the model checking problem for the aTAM is to be tractable. 

We achieve this search space reduction for a significant class of tile assembly systems: the 
"rectilinear" TASes. We define the class later, but it comprises most of the TASes studied in the 
theoretical literature; and, perhaps more significantly, it also includes all DNA tile assembly systems 
produced in laboratories to date, except for systems that were intentionally random (so not formally 
verifiable), or which consisted of essentially just one tile that bound to itself again and again (a 
context in which formal verification is not relevant). We show that rectilinear TASes are sufficiently
well behaved that it is easy to verify online whether an input TAS is part of such a class, and, if so,
to reduce the search space from exponential in size to $O(n^2)$, because most of the tiles added are
guaranteed to be independent of other tile additions. Since a configuration on an n x n surface can,
of course, have $n^2$ tiles, added one at a time, an $O(n^2)$ search space is asymptotically best possible.
We present this search space reduction in Section 4.

Armed with this reduction rule, we obtain initial experimental results. We translate tile assembly 
system files from a tile assembly simulator [14] into a Petri net formalism manipulable by the
SMART model checking engines [5]. This produces an aTAM model checker that runs in $O(|T| \cdot n^4)$
time, where $|T|$ is the number of tile types in tile assembly system T, and n x n is the size of
the assembly surface. Atypically for a model checking problem, in which the practical limit usually
is insufficient memory to store the state space, the limit in this case was the amount of memory
required to represent the rules of the model. (Storage of the state space and of the reachability
graph was small by comparison.) We report on our experimental results in Section 5 and, in
Section 6, discuss how to reduce both memory use and running time by constructing a front end
specialized to self-assembly. 

This paper is the first to connect formal verification with self-assembly. There is, of course, 
extensive research on model checking of asynchronous, concurrent systems [5]. There has also been
some initial work using model checkers to study quantitative properties of biological pathways,
such as protein creation [7]. Researchers have also started to apply formal methods to, and produce
simulation software for, the brand-new area of DNA circuits [16], though no verification tools yet
exist for such circuits. Model checkers and temporal logics have been extremely useful throughout 
computer science, and we hope that the current work makes their power available to theorists and 
practitioners of algorithmic self-assembly. 



2 Background 

2.1 Tile self-assembly background 

Winfree's objective in defining the Tile Assembly Model was to provide a useful mathematical
abstraction of DNA tiles combining in solution in a random, nondeterministic, asynchronous
manner [23]. Rothemund [17], and Rothemund and Winfree [18], extended the original definition of the
model. For a comprehensive introduction to tile assembly, we refer the reader to [11]. Intuitively,
we desire a formalism that models the placement of square tiles on the integer plane, one at a time, 
such that each new tile placed binds to the tiles already there, according to specific rules. Tiles 
have four sides (often referred to as north, south, east and west) and exactly one orientation, i.e., 
they cannot be rotated. 

A tile assembly system $\mathcal{T}$ is a 5-tuple $(T, \sigma, \Sigma, \tau, R)$, where $T$ is a finite set of tile types; $\sigma$ is the
seed tile or seed assembly, the "starting configuration" for assemblies of $\mathcal{T}$; $\tau : T \times \{N, S, E, W\} \to
\Sigma \times \{0, 1, 2\}$ is an assignment of symbols ("glue names," drawn from the set $\Sigma$) and a "glue strength" (0, 1, or 2) to the
north, south, east and west sides of each tile; and $R \subseteq \Sigma \times \Sigma$ is a symmetric relation that specifies
which glues can bind with nonzero strength. In this model, there are no negative glue strengths,
i.e., two tiles cannot repel each other.
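As a minimal sketch of how the 5-tuple might be held in code (a Python illustration of ours, not part of the original work), the class below stores $T$, $\sigma$, $\tau$ and $R$ and tests whether a tile can attach at a location at temperature 2; $\Sigma$ is left implicit in the glue names. The text does not spell out how two abutting glues combine into a strength, so the rule used here (a related pair contributes the smaller of its two strengths) is an assumption, and the names `TileAssemblySystem`, `strength` and `can_attach` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Location = Tuple[int, int]
Glue = Tuple[str, int]  # (glue name from Sigma, strength in {0, 1, 2})

# For each side: offset to the neighbouring location, and the neighbour's side facing back.
NEIGHBOUR = {
    "N": ((0, 1), "S"),
    "S": ((0, -1), "N"),
    "E": ((1, 0), "W"),
    "W": ((-1, 0), "E"),
}


@dataclass
class TileAssemblySystem:
    tile_types: FrozenSet[str]            # T
    seed: Dict[Location, str]             # sigma, given as an already-placed configuration
    tau: Dict[Tuple[str, str], Glue]      # tau : T x {N, S, E, W} -> Sigma x {0, 1, 2}
    relation: FrozenSet[Tuple[str, str]]  # R, assumed to be listed symmetrically
    temperature: int = 2                  # minimum total binding strength

    def strength(self, tile: str, loc: Location, config: Dict[Location, str]) -> int:
        """Total strength with which `tile` would bind at `loc` next to `config`.

        Assumption: abutting sides contribute min(s1, s2) when their glue names are related by R."""
        x, y = loc
        total = 0
        for side, ((dx, dy), facing) in NEIGHBOUR.items():
            other = config.get((x + dx, y + dy))
            if other is None:
                continue  # no tile on this side
            g1, s1 = self.tau[(tile, side)]
            g2, s2 = self.tau[(other, facing)]
            if (g1, g2) in self.relation:
                total += min(s1, s2)
        return total

    def can_attach(self, tile: str, loc: Location, config: Dict[Location, str]) -> bool:
        """True iff `loc` is empty and `tile` binds there with at least the temperature."""
        return loc not in config and self.strength(tile, loc, config) >= self.temperature
```

For instance, with a seed whose east glue is ("a", 2) and R = {("a", "a")}, a tile whose west glue is ("a", 2) can attach immediately east of the seed, but not north of it.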

A configuration of $\mathcal{T}$ is a set of tiles, all of which are tile types from $T$, that have been placed
in the plane, and the configuration is stable if the binding strength (from $\tau$ and $R$ in $\mathcal{T}$) at every
possible cut is at least 2. An assembly sequence is a sequence of single-tile additions to the frontier
of the assembly constructed at the previous stage. Assembly sequences can be finite or infinite in
length. The result of an assembly sequence $\vec{\alpha}$ is the union of the tile configurations obtained at every
finite stage of $\vec{\alpha}$. The set of assemblies produced by $\mathcal{T}$ is the set of all stable assemblies that can be built
by starting from the seed assembly of $\mathcal{T}$ and legally adding tiles. If $\alpha$ and $\beta$ are configurations of $\mathcal{T}$,
we write $\alpha \to \beta$ if there is an assembly sequence that starts at $\alpha$ and produces $\beta$. An assembly of $\mathcal{T}$
is terminal if no tiles can be stably added to it. Researchers are, of course, interested in being able
to prove that a certain tile assembly system always achieves a certain output. In [22], Soloveichik
and Winfree presented a strong technique for this: local determinism. An assembly sequence $\vec{\alpha}$
is locally deterministic if (1) each tile added in $\vec{\alpha}$ binds with the minimum strength required for
binding; (2) if there is a tile of type $t_0$ at location $l$ in the result of $\vec{\alpha}$, and $t_0$ and the immediate
"OUT-neighbors" of $t_0$ are deleted from the result of $\vec{\alpha}$, then no other tile type in $T$ can legally bind
at $l$; and (3) the result of $\vec{\alpha}$ is terminal. $\mathcal{T}$ is locally deterministic iff every legal tile assembly sequence of
$\mathcal{T}$ is locally deterministic. Local determinism is important because Soloveichik and Winfree showed
that if $\mathcal{T}$ is locally deterministic, then $\mathcal{T}$ has a unique terminal assembly [22]. In Section 3.2 we
consider how to test whether a tile assembly system is locally deterministic.

2.2 Model checking background 

We present the fundamentals of one logic often used in formal verification: CTL (Computation Tree
Logic) [3]. (We follow the presentation of CTL found in [20], which is standard.) The CTL syntax
may be defined as follows:

$$\varphi, \psi ::= E(\varphi\, U\, \psi) \mid A(\varphi\, U\, \psi) \mid EX\varphi \mid AX\varphi \mid \varphi \wedge \psi \mid \neg\varphi \mid P_1 \mid P_2 \mid \ldots$$

where $AP = \{P_1, P_2, \ldots\}$ is a set of atomic propositions. We interpret CTL statements over a
pointed transition system $S = (Q, R, \ell, s)$, where $Q$ is a finite set of states of the system, $R \subseteq Q \times Q$
is a transition relation among states, $\ell : Q \to 2^{AP}$ is a labeling of states with propositions (essentially
determining which atomic propositions are true in that state), and $s \in Q$ is the "point" or initial
state of the system.

Intuitively, the initial state s is the tile configuration at time step 0, consisting of the seed tile 
(or seed assembly) and nothing else. States of S are tile configurations on the n x n surface we 
have chosen to consider. States q and q' have the property qRq' iff there is a legal tile addition that
transforms the assembly at q into the assembly at q'. Finally, the atomic propositions true at each
state are precisely the assertions that a tile of a given type is present at a given surface location in 
that state, or that the location is empty. 

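Formally, a run $\pi$ is a (possibly countably infinite) sequence of states $\sigma_\pi = s_0, s_1, s_2, \ldots$ with a
labeling $\ell_\pi : \{s_0, s_1, \ldots\} \to 2^{AP}$. The formula $\varphi$ holds at position $i$ of $\pi$ according to the following
recursive definition:

$$
\begin{aligned}
\pi, i &\models P && \iff P \in \ell_\pi(s_i) \quad (\text{for } P \in AP)\\
\pi, i &\models \neg\varphi && \iff \pi, i \not\models \varphi\\
\pi, i &\models \varphi \wedge \psi && \iff \pi, i \models \varphi \text{ and } \pi, i \models \psi\\
\pi, i &\models X\varphi && \iff \pi, i+1 \models \varphi\\
\pi, i &\models \varphi\, U\, \psi && \iff \exists j \ge i \text{ such that } \pi, j \models \psi \text{ and } \pi, k \models \varphi \text{ for all } i \le k < j.
\end{aligned}
$$

Now let $\mathbb{T}$ be a tree rooted at an initial state, such that every path through $\mathbb{T}$ is a run. We can
define the CTL existence operator by

$$\pi, i \models E\varphi \iff \pi', i \models \varphi \text{ for some } \pi' \text{ in } \mathbb{T} \text{ such that } \pi[0, \ldots, i] = \pi'[0, \ldots, i].$$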

From the syntax above, it is possible to define "for all" quantifiers, such as AF ("along All paths, 
Finally something is true"), and AG ("along All paths, some statement holds Globally," i.e., in 
every state). 
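On a finite transition system, these operators can be evaluated by the textbook fixpoint characterizations, for example $AF\,\varphi = \mu Z.\,(\varphi \vee AX\,Z)$, $AG\,\varphi = \nu Z.\,(\varphi \wedge AX\,Z)$ and $E(\varphi\, U\, \psi) = \mu Z.\,(\psi \vee (\varphi \wedge EX\,Z))$. The Python sketch below is an explicit-state illustration of that standard technique, not the SMART implementation; the helper names are hypothetical. It assumes the transition relation is given as a successor map and, as is common, gives each terminal assembly a self-loop so that the relation is total (otherwise $AX$ would hold vacuously at deadlocks and $AF$ would be unsound).

```python
from typing import Dict, Hashable, Set

Succ = Dict[Hashable, Set[Hashable]]


def totalize(succ: Succ) -> Succ:
    """Give every state without successors (a terminal assembly) a self-loop."""
    return {s: (set(ts) if ts else {s}) for s, ts in succ.items()}


def ex(succ: Succ, target: Set[Hashable]) -> Set[Hashable]:
    """EX: states with SOME successor in target."""
    return {s for s, ts in succ.items() if ts & target}


def ax(succ: Succ, target: Set[Hashable]) -> Set[Hashable]:
    """AX: states ALL of whose successors are in target."""
    return {s for s, ts in succ.items() if ts <= target}


def eu(succ: Succ, phi: Set[Hashable], psi: Set[Hashable]) -> Set[Hashable]:
    """E(phi U psi): least fixpoint of Z = psi | (phi & EX Z)."""
    z = set(psi)
    while True:
        new = z | (phi & ex(succ, z))
        if new == z:
            return z
        z = new


def af(succ: Succ, phi: Set[Hashable]) -> Set[Hashable]:
    """AF phi: least fixpoint of Z = phi | AX Z."""
    z = set(phi)
    while True:
        new = z | ax(succ, z)
        if new == z:
            return z
        z = new


def ag(succ: Succ, phi: Set[Hashable]) -> Set[Hashable]:
    """AG phi: greatest fixpoint of Z = phi & AX Z, iterated downward from phi."""
    z = set(phi)
    while True:
        new = phi & ax(succ, z)
        if new == z:
            return z
        z = new
```

For example, if the seed state can reach a single terminal state "ab" along every path, then the seed belongs to `af(succ, {"ab"})`; if two different terminal states are reachable, the seed satisfies AF of neither.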

The states of a pointed transition system can be viewed as a partial order, with the transitive
closure of the transition relation generating the <-relation between states. One technique to make
model checking problems tractable is the use of partial order reductions, rules (often based on
concurrency or symmetry) that allow a model checker to consider a much smaller set of states while
guaranteeing that a property holds in the reduced partial order iff it holds in the full transition
system. We use this technique in Section 4.

3 Interpreting the aTAM in CTL 

3.1 The model checking problem for the aTAM 

The goal of this subsection is to show that the behavior of the aTAM on a finite surface can be 
expressed in CTL. The model checking problem for the aTAM then reduces to checking whether 
a formula that expresses a particular assembly is contained in the set of possible outcomes for the 
tile assembly system we are checking — and that can be determined using known model checking 
algorithms. In particular, if we know that a finite shape S is a terminal assembly with respect to
the binding rules of tile assembly system T, and $\varphi_{S,n}$ is a formula that asserts the presence of S on
an $n \times n$ surface, then (as we will see) if the CTL interpretation of T satisfies $AF(\varphi_{S,n})$, we have
formally verified that S is the unique terminal assembly of T. Throughout this subsection we fix a
natural number n, and assume the self-assembly we are simulating takes place on an $n \times n$ surface
(a finite subset of $\mathbb{Z} \times \mathbb{Z}$).

Our strategy is to define the behavior of $(k+1)n^2$ atomic propositions, where each
proposition corresponds to a tile type (or lack thereof) at a location on the n x n surface. These 
atomic propositions act as "agents" that "decide" whether or not to change state from "no tile is 
here" to "a tile of type t is here," consistent with the binding rules of the tile assembly system we 
wish to express in CTL. We formalize this with the following definition. 

Let $\mathcal{T} = (T, \sigma)$ be a tile assembly system with $k$ distinct tile types $\{t_1, \ldots, t_k\}$, and binding
rules defined by binding function $\beta$. Then $\mathrm{CTL}(\mathcal{T}, n)$, the CTL interpretation of the behavior of $\mathcal{T}$
on $n \times n$ surfaces, is defined as follows:

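1. Atomic propositions $t^m_{i,j}$ for each $0 \le m \le k$ and $0 \le i, j < n$. (Intuitively, if $t^m_{i,j}$ is true, tile type
$t_m$ is located at $(i,j)$, or, if $m = 0$, then $(i,j)$ is empty.)

2. For each $(i,j)$ with $0 \le i, j < n$, the following axiom (to capture that if a location is empty,
then no tile is there):

$$t^0_{i,j} \rightarrow \bigwedge_{1 \le m \le k} \neg t^m_{i,j}$$

3. For each $t^m_{i,j}$ with $m \ge 1$, the following axiom (to capture that exactly one tile can be placed on any filled
location):

$$t^m_{i,j} \rightarrow \neg \bigvee_{m' \in \{1, \ldots, k\} \setminus \{m\}} t^{m'}_{i,j}$$

4. For each $t^m_{i,j}$ with $m \ge 1$, the following axiom (to capture that once a tile has been placed, it will never be
removed):

$$t^m_{i,j} \rightarrow AG(t^m_{i,j})$$

5. For each $t^m_{i,j}$, include the following axioms to express the behavior of the binding function. The
arguments of $\beta$ are the tiles adjacent to $(i,j)$ on its north, south, east and west sides, in that order,
with $\emptyset$ denoting that no tile is present:

$$
\begin{array}{ll}
\text{Binding function of } \mathcal{T} & \text{Axiom of } \mathrm{CTL}(\mathcal{T}, n)\\
\hline
\beta(t_x, \emptyset, \emptyset, \emptyset) = t_y & (t^0_{i,j} \wedge t^x_{i,j+1}) \rightarrow (\neg t^0_{i,j} \wedge t^y_{i,j})\\
\beta(\emptyset, t_x, \emptyset, \emptyset) = t_y & (t^0_{i,j} \wedge t^x_{i,j-1}) \rightarrow (\neg t^0_{i,j} \wedge t^y_{i,j})\\
\beta(\emptyset, \emptyset, t_x, \emptyset) = t_y & (t^0_{i,j} \wedge t^x_{i+1,j}) \rightarrow (\neg t^0_{i,j} \wedge t^y_{i,j})\\
\beta(\emptyset, \emptyset, \emptyset, t_x) = t_y & (t^0_{i,j} \wedge t^x_{i-1,j}) \rightarrow (\neg t^0_{i,j} \wedge t^y_{i,j})\\
\beta(t_x, t_y, \emptyset, \emptyset) = t_z & (t^0_{i,j} \wedge t^x_{i,j+1} \wedge t^y_{i,j-1}) \rightarrow (\neg t^0_{i,j} \wedge t^z_{i,j})
\end{array}
$$

. . . and similarly with all other possible pairings of $t_x$ and $t_y$.

6. An axiom that enforces an initial state of the transition system that simulates the seed assembly.
Let $S$ be the set of points occupied by $\sigma$, the seed assembly of $\mathcal{T}$, and $\bar{S}$ be the set of points
(on the $n \times n$ surface) not occupied by $\sigma$. Let $r_{xy}$ be the tile type at location $(x, y)$ for each
$(x, y) \in S$. Then we include the following formula into the language:

$$\bigwedge_{(x,y) \in S} t^{r_{xy}}_{x,y} \;\wedge \bigwedge_{(a,b) \in \bar{S}} t^0_{a,b}$$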

We can define a formula that is a logical interpretation of any finite shape S on an n x n surface, 
just as we defined a formula to be the interpretation of the seed assembly. 

Definition 1. Let S be a finite tile configuration in tile assembly system $\mathcal{T}$, such that S completely
fits on a surface of size $n \times n$. Suppose $\{t_1, \ldots, t_k\}$ is the set of tile types of $\mathcal{T}$, and, for each
$1 \le i \le k$, $T_i = \{(x,y) \in S \mid t_i \text{ is present at } (x,y)\}$. Let $T_0$ be the set of all points of the surface
that are not in any $T_i$. Then we define the formula $\varphi_{S,n}$ of $\mathrm{CTL}(\mathcal{T}, n)$ to be

$$\varphi_{S,n} = \bigwedge_{0 \le i \le k \text{ and } (x,y) \in T_i} t^i_{x,y}.$$
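Because $\varphi_{S,n}$ is a plain conjunction of atomic propositions, it can be represented as the set of its conjuncts. The Python sketch below is a hypothetical illustration (the names `shape_formula` and `satisfies` are ours): a proposition $t^m_{x,y}$ is encoded as the triple `(m, x, y)`, tile types are numbered $1, \ldots, k$, and a surface state is a map from occupied locations to tile numbers.

```python
from typing import Dict, FrozenSet, Tuple

Location = Tuple[int, int]
Prop = Tuple[int, int, int]  # (m, x, y) stands for t^m_{x,y}; m = 0 means "(x, y) is empty"


def shape_formula(shape: Dict[Location, int], n: int) -> FrozenSet[Prop]:
    """Conjuncts of phi_{S,n}: t^i_{x,y} for every (x, y) in T_i, where T_0 is the set
    of surface points not covered by the shape. Assumes the shape fits in n x n."""
    props = set()
    for x in range(n):
        for y in range(n):
            m = shape.get((x, y), 0)  # 0 exactly when (x, y) is in T_0
            props.add((m, x, y))
    return frozenset(props)


def satisfies(state: Dict[Location, int], formula: FrozenSet[Prop]) -> bool:
    """A surface state satisfies the conjunction iff every conjunct is true in it."""
    return all(state.get((x, y), 0) == m for (m, x, y) in formula)
```

Every surface point contributes exactly one conjunct, so the formula has $n^2$ conjuncts and is satisfied by exactly one configuration of the surface.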

Note that this ignores the potential complication that a shape, viewed as a set of points, can be 
embedded on a surface in more than one location or orientation. Since we are always considering 
shapes in the context of what can be built by a tile assembly system, we can assume without loss of 
generality that the seed of the tile assembly system is rooted at the origin, and this will eliminate 
uniqueness problems that might otherwise arise from multiple possible embeddings of the shape 
into the surface under consideration. 

The main result of this section is the following theorem. We defer the proof to the Appendix. 

Theorem 1. There exists an efficient procedure to interpret a tile assembly system T and a surface 
n x n within the temporal logic CTL(T,n). In particular, the question "Does T have a unique 
terminal assembly that fits in the n x n surface?" can be reduced to the problem of model checking 
on CTL(T,n). 

3.2 Testing for local determinism 

While the general question, "Given tile assembly system T, is it locally deterministic?" is undecidable,
there is an open question in the literature whether there might exist an efficient test to catch
the failure of local determinism [3], much as programmers of concurrent systems design tests for 
race conditions. We answer this question affirmatively in this section. 

Theorem 2. It adds no asymptotic complexity cost to check for local determinism of T when 
solving a model checking problem for CTL(T, n). 

Proof. Let $\varphi \to \psi$ be a transition rule for location $(x,y)$ in $\mathrm{CTL}(\mathcal{T}, n)$. For example, for the
transition rule induced by $\beta(\emptyset, \emptyset, \emptyset, t_x) = t_y$, we have $\varphi = t^0_{i,j} \wedge t^x_{i-1,j}$ and $\psi = \neg t^0_{i,j} \wedge t^y_{i,j}$.
Then, for each $(x,y)$ and each transition rule $\varphi \to \psi$, define the formula

$$\eta^{\varphi \to \psi}_{x,y} := \varphi \rightarrow \Big( AF\,\psi \;\wedge\; AG \bigwedge_{\varphi'} \neg \varphi' \Big)$$

where $\varphi'$ ranges over antecedents of all other transition rules that might affect $(x, y)$. In words, $\eta^{\varphi \to \psi}_{x,y}$
asserts that if a transition rule for a location is enabled, (1) that transition rule will eventually
be executed, and (2) no other transition rule that affects the location in question will ever be
enabled. The conjunction of all the $\eta$ formulas is equivalent to the requirement of local determinism,
and it is a conjunction (of length polynomial in the number of tiles of $\mathcal{T}$) of well-formed assertions in
CTL. Hence the model checking problem for this conjunction is no more complex than the model
checking problem to determine unique terminal assembly.
4 Partial order reductions for rectilinear tile assembly systems

We move now from the theory of formal verification to its practical application. An arbitrary 
tile assembly system can build exponentially many distinct legal configurations, with respect to 
the size of the assembly surface, so the aTAM model checking problem in full generality suffers 
from intractable state explosion. Therefore, we simplify the problem by considering only rectilinear 
TASes. A rectilinear TAS is one in which all growth proceeds from the south to the north, and/or 
from the west to the east, with the minimum required binding strength. (See Figure 1 for a schematic
of such a tile assembly system.) We shall see that there is a computationally inexpensive way to check for
rectilinearity of an input TAS, as well as a partial order reduction that renders model checking of
rectilinear TASes tractable. Combined, this provides (for an $n \times n$ assembly surface) an $O(n^2)$
algorithm that either verifies that the input TAS has a unique terminal assembly, or provides an 
execution trace that demonstrates the input system's failure to be locally deterministic and/or 
rectilinear. 



Fig. 1. Example assembly sequence for a rectilinear tile assembly system. In (i) through (iii), the
TAS builds its edges north and east, then starts building its interior (to the east and north) in (iv).
By contrast, the configurations built in (v) are not rectilinear, as tiles are placed to the south or
west of other tiles.

For model checking to have practical application to nanoscale self-assembly, some state space 
reduction is essential, because even locally deterministic rectilinear TASes, which are tightly
constrained, exhibit an exponential blowup with respect to the size of the surface. This is shown
precisely in the following proposition, in which we assume (as is common in the literature) that the 
seed assembly of the input TAS is one tile in size. 

Proposition 1. Let T be a locally deterministic rectilinear tile assembly system with $|\sigma| = 1$, and
$n > 1$ an integer. Then the (worst-case) number of legal configurations T can build on an $n \times n$
surface, if $\sigma$ is placed at location $(0, 0)$, is given by:

$$\frac{2(2n-1)!}{n!\,(n-1)!} - 1.$$



A simple example of a TAS with this worst-case behavior is Winfree's seven-tile TAS that produces 
the discrete Sierpinski Triangle through an XOR calculation. 
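The count in Proposition 1 can be checked numerically. Since $\frac{2(2n-1)!}{n!\,(n-1)!} = \binom{2n}{n}$, the bound equals $\binom{2n}{n} - 1$, which grows exponentially in $n$. The Python sketch below is our own check (the function names are hypothetical): it enumerates the state space explicitly under an assumed XOR-style growth rule, in which a tile attaches at $(x, y)$ only once its in-surface west and south neighbors are present, and compares the result with the closed form.

```python
from math import comb, factorial
from typing import FrozenSet, List, Set, Tuple

Config = FrozenSet[Tuple[int, int]]


def proposition1_count(n: int) -> int:
    """Closed form of Proposition 1; 2(2n-1)!/(n!(n-1)!) equals C(2n, n)."""
    return 2 * factorial(2 * n - 1) // (factorial(n) * factorial(n - 1)) - 1


def count_reachable(n: int) -> int:
    """Count configurations reachable from a single seed tile at (0, 0) on an n x n
    surface, under a hypothetical XOR-style rule: a tile may attach at (x, y) only
    when its west and south neighbours (those inside the surface) are present."""
    start: Config = frozenset({(0, 0)})
    seen: Set[Config] = {start}
    stack: List[Config] = [start]
    while stack:  # explicit depth-first enumeration of the state space
        config = stack.pop()
        for x in range(n):
            for y in range(n):
                if (x, y) in config:
                    continue
                west_ok = x == 0 or (x - 1, y) in config
                south_ok = y == 0 or (x, y - 1) in config
                if west_ok and south_ok:
                    nxt = config | {(x, y)}
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return len(seen)


if __name__ == "__main__":
    print([count_reachable(n) for n in range(1, 6)])  # [1, 5, 19, 69, 251]
```

The reachable configurations are exactly the nonempty "staircase" shapes anchored at the seed, which is why the explicit count matches the closed form.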

We defer the proof to the Appendix. The result that permits us to perform practical model checking 
experiments on tile assembly systems is the following. 

Theorem 3. There exists a polynomial-time algorithm A that does the following. Given an input 
TAS T and a surface size n, A either produces a legal assembly sequence of T that demonstrates 
T is not rectilinear, or A correctly asserts that T has a unique terminal assembly, or A produces 
two legal assembly sequences of T that will result in distinct terminal assemblies. Further, A only
needs to evaluate $O(n^2)$ configurations of T.

The proof appears in the Appendix. 
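The algorithm A itself is given in the Appendix and is not reproduced here. The Python sketch below only illustrates why a single canonical assembly order keeps the search small. It makes several simplifying assumptions: temperature 2, glues that bind iff their (label, strength) pairs are identical, and rectilinear growth, so that each tile depends only on its west and south neighbors. It fills the surface in one fixed row-major order, visits at most $n^2$ configurations, and reports the first location where two tile types could attach. Unlike A, it does not test rectilinearity. The names `bind_strength` and `canonical_fill` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sides = Dict[str, Tuple[str, int]]  # side ("N", "E", "S", "W") -> (glue label, strength)
TEMPERATURE = 2


def bind_strength(tile: Sides, west: Optional[Sides], south: Optional[Sides]) -> int:
    """Strength contributed by the west and south neighbours only (rectilinear growth).
    Simplifying assumption: two sides bind iff their (label, strength) pairs are identical."""
    total = 0
    if west is not None and west["E"] == tile["W"]:
        total += tile["W"][1]
    if south is not None and south["N"] == tile["S"]:
        total += tile["S"][1]
    return total


def canonical_fill(types: Dict[str, Sides], seed: str, n: int):
    """Grow the assembly in one fixed row-major order, adding at most one tile per location.

    Returns ("nondeterministic", location, candidates) at the first location where two or
    more tile types could attach, otherwise ("unique", grid, number_of_configurations)."""
    grid: Dict[Tuple[int, int], str] = {(0, 0): seed}
    configurations = 1  # the seed configuration itself
    for y in range(n):
        for x in range(n):
            if (x, y) in grid:
                continue
            west = types[grid[(x - 1, y)]] if (x - 1, y) in grid else None
            south = types[grid[(x, y - 1)]] if (x, y - 1) in grid else None
            fits: List[str] = [name for name, t in types.items()
                               if bind_strength(t, west, south) >= TEMPERATURE]
            if len(fits) > 1:
                return ("nondeterministic", (x, y), fits)
            if fits:
                grid[(x, y)] = fits[0]
                configurations += 1  # one new configuration per added tile
    return ("unique", grid, configurations)
```

In row-major order, the west and south neighbors of every location are already settled when it is visited. This is what lets one ordering stand in for all the interleavings counted in Proposition 1.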

5 Experimental results 

Probably the most-used aTAM simulator is Matthew Patitz's ISU TAS [14]. We wrote a Java
program that translates ISU TAS tile assembly system files into the language of the SMART model
checking engines [3]. SMART, the Stochastic Model-checking Analyzer for Reliability and Timing,
was initially intended as software to describe and analyze complex timed and stochastic models.
It has since been extended to include model checking of both stochastic and nondeterministic systems. We
translate tile assembly systems into nondeterministic Petri nets that SMART can manipulate. Our 
experiments use SMART version 3.1, obtained from Andrew Miner. 

A Petri net is a widely used structure to model concurrent systems [12]. We will not define Petri
nets formally here; they can be thought of as directed graphs along which tokens move. We use
Petri nets extended with transition guards. Tokens are located on vertices, and can move along an
edge to neighboring vertices if the transition for that edge is enabled by the guard for the transition.
The state of the system is the snapshot of all current token locations. We translate a tile assembly
system T acting on an $n \times n$ surface into a Petri net with $(|T| + 1)n^2$ vertices, one for each assertion,
"tile type t is located at (x, y)," or "location (x, y) is empty." For $0 \le x, y < n$, there are directed
edges from the vertex corresponding to "(x, y) is empty" to each vertex corresponding to "tile type t is
located at (x, y)," for t a tile type in T. In the initial state, tokens are placed to simulate the seed assembly
of T. The transition rules of the Petri net simulate the self-assembly of the tiles: if location (x, y)
has west neighbor t and south neighbor t', a nondeterministic transition is enabled for (x, y) iff a
tile could legally bind to that configuration in the aTAM. A SMART code fragment appears in
Figure 8 in the Appendix.
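The translation just described can be mirrored in a few lines of Python. This is a hypothetical sketch of the encoding, not the authors' Java translator and not SMART syntax. There is one place per (location, tile type or "empty") pair, giving $(|T|+1)n^2$ places, and one guarded transition per (location, tile type) that moves the location's token from its "empty" place to that tile type's place. The guard `can_bind(t, west, south)` is an assumed oracle standing in for the aTAM binding rules.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Location = Tuple[int, int]
Place = Tuple[int, int, int]   # (x, y, m): m = 0 "(x, y) is empty", m >= 1 "t_m is at (x, y)"
Marking = Dict[Location, int]  # for each location, the m of the one place holding its token
Guard = Callable[[Marking], bool]


def build_net(num_types: int, n: int, can_bind: Callable[[int, int, int], bool]):
    """Places and guarded transitions for a TAS with tile types 1..num_types on an n x n surface.

    can_bind(t, west, south) is an assumed oracle (0 = no neighbour) saying whether tile
    type t may attach next to the given west and south neighbours."""
    places: List[Place] = [(x, y, m) for x, y in product(range(n), repeat=2)
                           for m in range(num_types + 1)]
    transitions: List[Tuple[Place, Guard]] = []
    for x, y in product(range(n), repeat=2):
        for t in range(1, num_types + 1):
            # Default arguments freeze x, y, t for this particular transition.
            def guard(mk: Marking, x: int = x, y: int = y, t: int = t) -> bool:
                west = mk.get((x - 1, y), 0)
                south = mk.get((x, y - 1), 0)
                return mk[(x, y)] == 0 and can_bind(t, west, south)
            transitions.append(((x, y, t), guard))
    return places, transitions


def fire(mk: Marking, label: Place) -> Marking:
    """Move the token of (x, y) from the 'empty' place to the place of tile type t."""
    x, y, t = label
    new = dict(mk)
    new[(x, y)] = t
    return new
```

With a single tile type that may attach next to any west or south neighbor, and the seed at (0, 0), exactly two transitions are enabled in the initial marking: those for (1, 0) and (0, 1).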

The experiment we ran (on different TASes at different surface sizes) was to verify that the 
input TAS achieved a unique terminal assembly for the input surface size. SMART verified this by 
first building the transition system induced by the Petri net, and then calculating the cardinality 
of the set "all reachable states minus all states with successors." We performed these experiments 
on a 2.13 GHz Intel Xeon CPU with 48 GB RAM running Linux. Our experimental results show
that, atypically for a model checking problem, the limiting factor is the memory required to store
the rules of the model, not the memory required to store the state space. (Data supporting this
conclusion appear in Table 2 in the Appendix.) Further, the size of the model depended heavily on
the logical complexity of the binding rules, or the Petri net guard commands. Figure 3 shows this:
a TAS with 333 tile types required less memory to model than a TAS with 128 tile types, because 
the possible bonds induced by its tiles were logically simpler to describe. 
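The verification criterion used in these experiments, "all reachable states minus all states with successors," can be sketched as an explicit-state search. The Python below is a generic illustration under our own assumptions, not SMART code. `count_terminal_states` builds the reachable state space from a successor function and returns the number of deadlock states; a unique terminal assembly corresponds to a count of 1. The helper `grid_successors` is a hypothetical toy rule used only to exercise the function.

```python
from typing import Callable, Hashable, List, Set, Tuple


def count_terminal_states(initial: Hashable,
                          successors: Callable[[Hashable], List[Hashable]]) -> int:
    """Explicitly build the reachable state space, then return
    |all reachable states minus all states with successors|."""
    reachable: Set[Hashable] = {initial}
    with_successors: Set[Hashable] = set()
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        nxt = successors(state)
        if nxt:
            with_successors.add(state)
        for s in nxt:
            if s not in reachable:
                reachable.add(s)
                frontier.append(s)
    return len(reachable - with_successors)


def grid_successors(n: int, k: int) -> Callable[[Tuple[int, ...]], List[Tuple[int, ...]]]:
    """Hypothetical toy rule: any of k tile types may attach at an empty location whose
    west or south neighbour is occupied. States are tuples indexed by y * n + x (0 = empty)."""
    def succ(state: Tuple[int, ...]) -> List[Tuple[int, ...]]:
        out: List[Tuple[int, ...]] = []
        for i, v in enumerate(state):
            if v:
                continue
            x, y = i % n, i // n
            if (x > 0 and state[i - 1]) or (y > 0 and state[i - n]):
                out.extend(state[:i] + (t,) + state[i + 1:] for t in range(1, k + 1))
        return out
    return succ
```

With one tile type on a 2 x 2 surface there is exactly one terminal state. With two interchangeable tile types, each of the three non-seed locations can end up holding either type, giving eight terminal states and hence no unique terminal assembly.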



Since the size of the state space is not a major concern, all our experiments use explicit model
checking (building the entire transition system), instead of symbolic model checking (using
decision diagrams to represent sets of states of a transition system compactly). Figure 2 graphs our results.
The numerical data appear in Table 1 in the Appendix.

The Sierpinski Triangle TAS is due to Winfree. Kautz and Lathrop designed the TASes for
the Sierpinski Carpet and a "numerically self-similar" variant [8]. The Fibered Sierpinski Triangle
TAS is due to Patitz and Summers [15]. Unlike the other TASes, the Fibered Sierpinski Triangle is
not rectilinear, and it does not place a tile on every point in the first quadrant. So the number of
configurations to count for a given surface size n differs from that of the other TASes.
This accounts for the different shape of its curve in Figure 2.

SMART built the transition system by starting with the initial configuration, and then checking
each location (x, y) to see if it was legal to enable a transition from "(x, y) is empty" to
"(x, y) contains t" for each tile type t. Since there are $n^2$ total configurations (after partial order
reductions), $n^2$ locations, and $|T|$ tile types, this requires $O(|T| \cdot n^4)$ operations. However, this
algorithm does not take advantage of special characteristics of tile assembly systems. We believe a
specialized front end for self-assembly can significantly reduce both memory cost and running time;
we discuss this in Section 6.



Fig. 2. Experimental results: length of time required to verify unique terminal assembly with respect
to the size of the (square) assembly surface. Additional experiments on the Sierpinski Carpet
Variant were not possible, because of memory limitations; similarly, the spike in time required for
the Sierpinski Carpet was due to nearing the limit of system memory (see Figure 3).
[Plot series: Fibered Sierpinski Triangle (333 tile types); Sierpinski Carpet Variant (128 tile types); Sierpinski Carpet (30 tile types); Sierpinski Triangle (7 tile types).]
6 Conclusion and future work

The primary contribution of this paper was the interpretation of a self-assembly model into CTL,
in such a way that an important class of tile assembly systems was in principle tractable to model
checking. Experimentally, however, we made almost no use of the power of CTL; we were solely
interested in counting the number of deadlock states of a transition system. Modeling the aTAM,
in which tiles bind forever in an error-free manner, is strictly simpler than modeling a more realistic
system (such as Winfree's Kinetic Tile Assembly Model, or kTAM) in which tiles can bind in error,
or fall off of assemblies after initially binding. Therefore, we see four important directions for future
work.

Fig. 3. Experimental results: memory required to define tile binding rules for each location on the
(square) assembly surface. This does not include memory required for either the state space or the
reachability graph (see Table 2 in the Appendix).

Reduce memory cost The transition rules are the same for each location on the surface. Currently,
SMART does not take advantage of this, instead storing a distinct copy of the rules for
each location. We could significantly reduce memory cost with a front end specialized to tile 
assembly, so that the size of the state space becomes the primary memory issue (as is the case 
with more "standard" model checking problems). 

Reduce running time. Produce a guard and transition manager for SMART specialized to self-assembly. Since tiles can bind only at the frontier of the growing assembly, there are at most linearly many locations to check at each stage, instead of all $n^2$ locations. Further, an on-the-fly construction of a hash table of the tiles that can legally bind to a given configuration of neighbors would eliminate the need to consider all $|T|$ tile types for each location at each stage. This might permit a reduction in running time from $O(|T| \cdot n^4)$ to $O(n^2 + |T|)$.
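The two ideas in the running-time item, checking only frontier locations and memoizing which tile types can bind for a given configuration of neighbors, can be sketched in Python as follows. This is not SMART code and not the authors' implementation; the tile encoding (a tuple of (north, east, south, west) glues, each a (label, strength) pair or None), temperature 2, and all function names are assumptions made for illustration.

```python
from functools import lru_cache

TAU = 2  # assumed temperature
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W


def make_binder(tile_types):
    """Return a memoized map: neighbor-glue signature -> bindable tile indices."""

    @lru_cache(maxsize=None)
    def bindable(signature):
        # signature[d] is the glue the neighbor in direction d presents toward
        # the empty site (None if there is no neighbor). All |T| tile types are
        # scanned only the first time a given signature is seen.
        result = []
        for idx, glues in enumerate(tile_types):
            strength = sum(g[1] for g, facing in zip(glues, signature)
                           if g is not None and g == facing)
            if strength >= TAU:
                result.append(idx)
        return tuple(result)

    return bindable


def initial_frontier(assembly, n):
    """Empty on-surface locations adjacent to at least one placed tile."""
    return {(x + dx, y + dy)
            for (x, y) in assembly for dx, dy in OFFSETS
            if 0 <= x + dx < n and 0 <= y + dy < n
            and (x + dx, y + dy) not in assembly}


def frontier_moves(assembly, frontier, tile_types, bindable):
    """All legal (site, tile index) additions, checking only frontier sites."""
    moves = []
    for (x, y) in frontier:
        # The neighbor in direction d presents its glue on the opposite side.
        sig = tuple(
            tile_types[assembly[(x + dx, y + dy)]][(d + 2) % 4]
            if (x + dx, y + dy) in assembly else None
            for d, (dx, dy) in enumerate(OFFSETS))
        moves.extend(((x, y), t) for t in bindable(sig))
    return moves


def add_tile(assembly, frontier, site, t, n):
    """Place tile t at site and update the frontier incrementally."""
    assembly[site] = t
    frontier.discard(site)
    x, y = site
    for dx, dy in OFFSETS:
        nb = (x + dx, y + dy)
        if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in assembly:
            frontier.add(nb)
```

A caller would pick one move, apply it with `add_tile`, and repeat; because of the memoization, later stages that present a previously seen neighbor signature do not rescan the tile set.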

Verify other classes of TAS. There are many tile assembly systems with strong symmetry properties that, we believe, will induce partial order reductions. We plan to explore making the model checking problem tractable for a wide variety of TASes, not just the rectilinear ones.

Verify within more realistic models. Use probabilistic CTL to verify stochastic, error-permitting models of self-assembly. For example, the SMART CTMC engine could be used to verify TASes in the kTAM, much as Winfree's xgrow software simulates kTAM TASes as continuous-time Markov chains.
A Proof of Theorem 1

We now build the tools needed to solve the model checking problem for the aTAM. 
Definition 2. Let $\mathcal{T} = (T, \sigma)$ be a tile assembly system, and let $n \in \mathbb{N}$ be large enough that $\sigma$ fits completely in the $n \times n$ surface. We define $M_{\mathcal{T},n}$, the canonical transition system for $\mathcal{T}$ on $n \times n$, as follows.

1. The initial state of $M_{\mathcal{T},n}$ is the configuration with $\sigma$ placed on the surface, with the lower-left corner of $\sigma$ at $(0,0)$. (Since a finite seed assembly can be encoded into a single seed tile by increasing $|T|$ in a way that does not affect our results in this paper, we can assume without loss of generality that there is a unique, unambiguous way to place $\sigma$ on the surface, so that it is "rooted" at $(0,0)$.)

2. The states of $M_{\mathcal{T},n}$ are the tile configurations on $n \times n$ that can be achieved by legal tile additions, starting with $\sigma$, according to the binding rules of $\mathcal{T}$.

3. The transition relation of $M_{\mathcal{T},n}$ is defined in the natural way: configuration $c$ can transition to configuration $c'$ if there is a legal single-tile addition that transforms $c$ into $c'$.

4. $M_{\mathcal{T},n}$ has associated with it a set of $(k+1)n^2$ atomic propositions (where $|T| = k$), which we place in one-to-one correspondence with the triples $(m, i, j)$, where $0 \le m \le k$ and $0 \le i, j < n$.

5. The labeling function of $M_{\mathcal{T},n}$ maps a state to the subset of atomic propositions that correspond to the tiles placed at each location in that state, and to the locations that are empty in that state.
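For small surfaces, $M_{\mathcal{T},n}$ can be built explicitly by breadth-first search over legal single-tile additions; its deadlock states are exactly the terminal assemblies that fit on the surface. The Python sketch below does this for a hypothetical four-tile system (a seed, a west-column tile, a south-row tile, and a fill tile that needs both its west and south neighbors). The tile encoding, temperature 2, and the names are assumptions, and explicit enumeration is only practical for small $n$.

```python
from collections import deque

TAU = 2  # assumed temperature
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W

# Hypothetical tile types: glues ordered (N, E, S, W), each (label, strength) or None.
TILES = [
    (("v", 2), ("h", 2), None, None),          # 0: seed
    (("v", 2), ("x", 1), ("v", 2), None),      # 1: grows north along the west edge
    (("x", 1), ("h", 2), None, ("h", 2)),      # 2: grows east along the south edge
    (("x", 1), ("x", 1), ("x", 1), ("x", 1)),  # 3: fill, needs west and south
]


def legal_additions(config, tile_types, n):
    """Yield (site, tile) pairs that can legally bind to config on the n x n surface."""
    occupied = dict(config)
    for x in range(n):
        for y in range(n):
            if (x, y) in occupied:
                continue
            for t, glues in enumerate(tile_types):
                strength = 0
                for d, (dx, dy) in enumerate(OFFSETS):
                    nb = occupied.get((x + dx, y + dy))
                    # A glue matches if the neighbor's opposite-side glue is identical.
                    if nb is not None and glues[d] is not None \
                            and glues[d] == tile_types[nb][(d + 2) % 4]:
                        strength += glues[d][1]
                if strength >= TAU:
                    yield (x, y), t


def explore(tile_types, seed, n):
    """Breadth-first construction of the reachable states of M_{T,n}."""
    start = frozenset(seed.items())
    seen, queue, terminal = {start}, deque([start]), []
    while queue:
        state = queue.popleft()
        successors = [state | {move} for move in legal_additions(state, tile_types, n)]
        if not successors:
            terminal.append(state)  # deadlock state = terminal assembly
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, terminal
```

For example, `explore(TILES, {(0, 0): 0}, 3)` returns 19 reachable states, exactly one of which (the filled 3 x 3 square) is terminal.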

Not surprisingly, the canonical model generated by $\mathcal{T}$ and $n$ is a model for $\mathrm{CTL}(\mathcal{T}, n)$.

Lemma 1. $M_{\mathcal{T},n} \models \mathrm{CTL}(\mathcal{T}, n)$.

Proof. All states of $M_{\mathcal{T},n}$ contain the seed tile (or seed assembly), so all states satisfy the conjunction requiring the $\mathrm{CTL}(\mathcal{T}, n)$-analogue of the seed to be present. The only way to transition from one state to the next is through a legal binding of a tile to an empty location. $\mathrm{CTL}(\mathcal{T}, n)$ ensures that tiles never fall off, that multiple tiles are never placed at the same location, and that the transition rules for each location are exactly those that exist in $M_{\mathcal{T},n}$. So $M_{\mathcal{T},n}$ is a model for $\mathrm{CTL}(\mathcal{T}, n)$.

More importantly, up to isomorphism, there is only one model for $\mathrm{CTL}(\mathcal{T}, n)$ of the same size as $M_{\mathcal{T},n}$. This allows us to apply model checking algorithms without having to worry about finding "counterexamples" that only show the system being checked is modeled incorrectly. (This is a common concern in formal verification: the systems to be verified are often so complex that the only way to make the problem tractable is to build a submodel that, hopefully, captures the important aspects of the system.)

Lemma 2. Let $N$ be a pointed transition system such that $N \models \mathrm{CTL}(\mathcal{T}, n)$, and such that the number of atomic propositions of $N$ is the same as the number of atomic propositions of $M_{\mathcal{T},n}$. Then the subset of $N$ that is connected to the initial state is isomorphic to $M_{\mathcal{T},n}$.

Proof. Since $\mathrm{CTL}(\mathcal{T}, n)$ requires that every state of any model satisfy $\psi_{\sigma,n}$, the initial state $s_N$ of $N$ must satisfy it. Satisfaction of $\psi_{\sigma,n}$ induces a bijection between the atomic propositions appearing in the labeling function of $N$ and the atomic propositions of $\mathrm{CTL}(\mathcal{T}, n)$. If we consider the connected component of $N$ rooted at $s_N$, states can only be connected by the transition relation of $N$ if they adhere to the transition rules permitted by $\mathrm{CTL}(\mathcal{T}, n)$. By tracing the bijection from the atomic propositions of $N$ to those of $\mathrm{CTL}(\mathcal{T}, n)$, and from those to the atomic propositions of $M_{\mathcal{T},n}$, we can construct an isomorphism as in the statement of the lemma.

Given these two lemmas, we can prove our main result: the question of unique terminal assembly 
is amenable to model checking. 
Theorem 1. There exists an efficient procedure to interpret a tile assembly system $\mathcal{T}$ and a surface $n \times n$ within the temporal logic $\mathrm{CTL}(\mathcal{T}, n)$. In particular, the question "Does $\mathcal{T}$ have a unique terminal assembly that fits in the $n \times n$ surface?" can be reduced to the problem of model checking on $\mathrm{CTL}(\mathcal{T}, n)$.

Proof. With $\mathrm{CTL}(\mathcal{T}, n)$ defined as above, we can express the notion of a terminal assembly as

$$\phi := \bigwedge_{i,j \in \{0,\dots,n-1\}} \left( t^{0}_{i,j} \rightarrow \mathbf{AG}\left(t^{0}_{i,j}\right) \right).$$

The formula $\phi$ is true only in states (configurations) to which no more tiles can be legally added, and it is true in all such states.

Now suppose we want to determine whether tile assembly system $\mathcal{T}$ has unique terminal assembly $S$, where $S$ is a finite structure. First, assume we know $S$ is terminal. Then we choose $n$ minimal such that $S$ is contained in the $n \times n$ surface, we plug $\mathrm{CTL}(\mathcal{T}, n)$ into a model checker, and we ask whether $\mathbf{AF}(\psi_{S,n})$ is true. A counterexample would indicate an additional terminal assembly.

Second, suppose that we do not know a priori whether $S$ is terminal. (Perhaps $S$ is large, and was not analyzed exhaustively.) We choose $n$ as before, and first ask whether $S$ is a state that satisfies $\phi$ in $\mathrm{CTL}(\mathcal{T}, n)$. If $S$ is not terminal, we stop. Otherwise, we proceed as above and solve the model checking problem for $\mathbf{AF}(\psi_{S,n})$. This process allows for formal verification that $S$ is the unique terminal assembly of $\mathcal{T}$.

Finally, suppose we do not start with a candidate $S$, but simply wish to know whether some finite unique terminal assembly exists. This question is NP-complete in general [2], but as long as we can approximate the size of a terminal assembly within a log-factor, we can answer it and still remain polynomially close to the running time for the first surface we consider. Specifically, we choose an $n$ and find the set of terminal assemblies for $\mathcal{T}$ on that $n \times n$ surface. By examining the perimeter of such an assembly, we know with $4n - 4$ comparisons whether it is in fact a terminal assembly even on a larger surface. (If there are unattached strength-two bonds on the perimeter of the surface, then the assembly can continue to grow; otherwise, it cannot.) We then increase the size of the surface by up to a log-factor, as that only increases the maximum worst-case number of legal tile configurations by a polynomial factor. Assuming our initial estimate of $n$ was sufficiently close, we will find the terminal assembly for the structure, in time polynomial in the number of configurations, using standard CTL model checking algorithms.
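The perimeter test can be sketched in Python as follows. At temperature 2, a location just outside an $n \times n$ surface touches exactly one location of the surface, so a tile can attach there only through a single glue of strength at least 2; the sketch looks for such outward-facing glues on the $4n - 4$ perimeter locations and, as an added safeguard, also requires that some tile type carries the matching glue. The tile encoding and the function names are assumptions.

```python
TAU = 2  # assumed temperature
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W


def perimeter_cells(n):
    """The 4n - 4 locations on the boundary of the n x n surface (n >= 2)."""
    return [(x, y) for x in range(n) for y in range(n)
            if x in (0, n - 1) or y in (0, n - 1)]


def can_outgrow(assembly, tile_types, n):
    """True if some perimeter tile exposes, off the surface, a glue of strength
    >= TAU that the opposite side of some tile type matches."""
    for (x, y) in perimeter_cells(n):
        t = assembly.get((x, y))
        if t is None:
            continue
        for d, (dx, dy) in enumerate(OFFSETS):
            glue = tile_types[t][d]
            off_surface = not (0 <= x + dx < n and 0 <= y + dy < n)
            if off_surface and glue is not None and glue[1] >= TAU:
                # Some tile type must present the same glue on its opposite side.
                if any(u[(d + 2) % 4] == glue for u in tile_types):
                    return True
    return False
```

Here `assembly` maps locations to tile indices; a `False` result means the assembly found on the $n \times n$ surface stays terminal on any larger surface.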

B Proof of Proposition 1

Proposition 1. Let $\mathcal{T}$ be a locally deterministic rectilinear tile assembly system with $|\sigma| = 1$, and let $n > 1$ be an integer. Then the (worst-case) number of legal configurations $\mathcal{T}$ can build on an $n \times n$ surface, if $\sigma$ is placed at location $(0, 0)$, is given by

$$\frac{2(2n-1)!}{n!\,(n-1)!} - 1 = \binom{2n}{n} - 1.$$

A simple example of a TAS with this worst-case behavior is Winfree's seven-tile TAS that produces 
the discrete Sierpinski Triangle through an XOR calculation. 
Proof. The worst case is one in which the possible configurations fill the entire $n \times n$ surface, so we assume $\mathcal{T}$ has this property; the seven-tile Sierpinski Triangle TAS does. We define a directed graph, and decorate each node with a number of tile configurations reachable from the seed assembly of $\mathcal{T}$, as follows.

Start by creating a root node, and decorate it with 1. (This represents the seed tile placed at $(0, 0)$.) Attach two chains of length $n - 1$ to the root, and decorate each node in each chain with 1. (These represent the choices of adding tiles only to the north along the west side of the assembly, or only to the east along the south side of the assembly.)

Now repeat the following until all configurations are exhausted:

For each pair of nodes $(a, b)$ equidistant from the root, create node $c$, and create edges $a \to c$ and $b \to c$. Let $A$ be the set of configurations counted at $a$, and let $B$ be the set of configurations counted at $b$. For each pair of configurations $\alpha \in A$ and $\beta \in B$, let $\gamma = \mathrm{dom}\,\alpha \cup \mathrm{dom}\,\beta$ be counted at $c$, and let $\Gamma$ be the set of such $\gamma$. Further, let $\Gamma' = \{\gamma' \mid \gamma'$ is achievable from some $\gamma \in \Gamma$ without adding any northern tiles to the west side of $\gamma$, or any eastern tiles to the south side of $\gamma\}$. Let $\Gamma \cup \Gamma'$ be the set of configurations counted at $c$, and decorate $c$ with $|\Gamma \cup \Gamma'|$.

Note that each configuration is counted exactly once, due to the rectilinearity and local determinism of $\mathcal{T}$: it is enough to choose how many tiles to place along the west and south edges to uniquely determine the configurations that must appear as a result. Also note that, as $n$ grows, this decorated graph is an increasingly large diamond-shaped part of Pascal's Triangle, and the total number of possible configurations is the sum of the decorations of all nodes of the diamond. (Figures 4 and 5 show the cases $n = 2$ and $n = 3$, which may provide some intuition.) This is a known counting problem, already part of the Online Encyclopedia of Integer Sequences (see Ralf Stephan's comment in [1]), and we obtain the formula from this resource.
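The counting step can be checked numerically for small $n$. The Python sketch below sums the Pascal's-triangle entries $\binom{i+j}{i}$, $0 \le i, j \le n-1$, that form the diamond, and compares the result with a brute-force count of staircase-shaped configurations, that is, sets of locations in which every tile's on-surface west and south neighbors are also present, as in the worst case of Proposition 1. Describing a staircase by non-increasing column heights is our own encoding, not the paper's, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def diamond_sum(n):
    """Sum of the Pascal's-triangle entries C(i + j, i) in the n x n diamond."""
    return sum(comb(i + j, i) for i in range(n) for j in range(n))


def count_staircases(n):
    """Brute-force count of nonempty staircase configurations on n x n.

    Column x holds tiles at heights 0 .. h[x] - 1; the set contains the west
    and south neighbor of each of its tiles exactly when h is non-increasing.
    """
    return sum(
        1
        for h in product(range(n + 1), repeat=n)
        if h[0] >= 1 and all(h[k] >= h[k + 1] for k in range(n - 1))
    )


for n in range(1, 7):
    print(n, diamond_sum(n), count_staircases(n))
```

The brute force enumerates $(n+1)^n$ height vectors, so it is only meant as a sanity check for small $n$.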



C Proof of Theorem 3 

Theorem 3. There exists a polynomial-time algorithm $A$ that does the following. Given an input TAS $\mathcal{T}$ and a surface size $n$, $A$ either produces a legal assembly sequence of $\mathcal{T}$ that demonstrates $\mathcal{T}$ is not rectilinear, or correctly asserts that $\mathcal{T}$ has a unique terminal assembly, or produces two legal assembly sequences of $\mathcal{T}$ that result in distinct terminal assemblies. Further, $A$ only needs to evaluate $O(n^2)$ configurations of $\mathcal{T}$.

Proof. Since the question "Does $\mathcal{T}$ have a unique terminal assembly?" reduces to a model checking problem, and since efficient model checking algorithms return either "yes" or an execution trace that functions as a counterexample, our task simplifies to producing (1) a test for rectilinearity, and (2) a partial order reduction so that the search space for algorithm $A$ is only $O(n^2)$. The test for rectilinearity is straightforward: each time we place a tile to build a larger configuration, we check whether the tile has any unattached strength-two bonds. If a tile has an unattached strength-two bond to the north, it must be at the western edge of the configuration. Similarly, if a tile has an unattached strength-two bond to the east, it must be at the southern edge of the configuration. If we encounter an unattached strength-two bond in any other situation, it is a counterexample to the rectilinearity of the input tile set.
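A Python sketch of this rectilinearity test follows. It assumes the seed sits at $(0, 0)$, so that the "western edge" and "southern edge" of the configuration are read as the column $x = 0$ and the row $y = 0$; that reading, the tile encoding (glues ordered north, east, south, west, each a (label, strength) pair or None), and the function name are assumptions. A strength-two glue counts as unattached when the neighboring location is empty.

```python
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W
NAMES = "NESW"


def rectilinearity_violations(assembly, tile_types):
    """Return (site, direction) for each unattached strength-2 glue that breaks
    rectilinearity: only north glues on x == 0 and east glues on y == 0 are allowed."""
    violations = []
    for (x, y), t in assembly.items():
        for d, (dx, dy) in enumerate(OFFSETS):
            glue = tile_types[t][d]
            if glue is None or glue[1] < 2 or (x + dx, y + dy) in assembly:
                continue  # no glue, too weak, or neighbor location already filled
            allowed = (NAMES[d] == "N" and x == 0) or (NAMES[d] == "E" and y == 0)
            if not allowed:
                violations.append(((x, y), NAMES[d]))
    return violations
```

Running the check after every tile placement and stopping at the first nonempty result yields an assembly sequence that witnesses non-rectilinearity, which is the first kind of output named in Theorem 3.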

Fig. 4. Counting the number of unique configurations for a locally deterministic, rectilinear tile assembly system with seed at the origin on a 2 x 2 surface; the enumeration is isomorphic to the minimal Pascal triangle diamond.

The partial order reductions rely on the following symmetries: if a tile binds at location $(x, y)$, its binding is independent of any tiles at $(i, j)$ for $i < x$ and $j > y$, and independent of any tiles at $(k, l)$ for $k > x$ and $l < y$. This allows us to limit our search of the configuration space as shown in Figure 6. If the input TAS is both rectilinear and uniquely determined, there are four distinct configurations in each of the three-location shaded regions of Figure 6. For the locations marked with a "1", we can simply fix a priority order, for example placing all tiles in the $x = 0$ column before moving to the $x = 1$ column. Hence, for an $n \times n$ surface, the number of configurations we need to search is



$$
\begin{aligned}
\#(\text{configs in shaded regions}) + \#(\text{locations marked ``1''})
  &= 4(n-1) + \Big[ 2\sum_{i=1}^{n-2} i + 1 \Big] \\
  &= 4(n-1) + \big[ (n-1)(n-2) + 1 \big] \\
  &= (n-1)(n+2) + 1 \\
  &= n^2 + n - 1.
\end{aligned}
$$

If we find at any location that more than one tile can be placed there, we have a counterexample to uniqueness of the terminal assembly. Otherwise, we need only check $O(n^2)$ configurations to verify that the input tile set is both rectilinear and has a unique terminal assembly.
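The count above can be compared with the size of an unreduced search. The Python sketch evaluates the reduced count term by term and, for contrast, counts every nonempty staircase-shaped configuration on the $n \times n$ surface (the worst case of Proposition 1, in which a tile needs its west and south neighbors) by dynamic programming over column heights. The function names and the staircase encoding are assumptions.

```python
def reduced_search_size(n):
    """Configurations examined after the partial order reduction (n >= 2)."""
    shaded = 4 * (n - 1)                   # four configurations per shaded region
    marked = 2 * sum(range(1, n - 1)) + 1  # locations marked "1"
    return shaded + marked


def exhaustive_size(n):
    """Nonempty staircase configurations on n x n, by dynamic programming."""
    # ways[h]: number of non-increasing column-height prefixes whose last column has height h
    ways = [1] * (n + 1)
    for _ in range(n - 1):
        ways = [sum(ways[h:]) for h in range(n + 1)]
    return sum(ways) - 1  # drop the empty configuration


for n in (2, 5, 10, 20):
    print(n, reduced_search_size(n), exhaustive_size(n))
```

The reduced count grows quadratically while the exhaustive count grows exponentially; for $n = 2$ the two coincide, since there is nothing to prune on such a small surface.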

D Translation of ISU TAS files into SMART language 

The ISU TAS simulator uses two files to define a tile assembly system: a file that describes each tile type in a text format (see Figure 7), and a file that defines the seed assembly of the TAS by listing filenames and locations. We translate this TAS, and its implicitly defined behavior, into the SMART language by declaring a Petri net as shown in Figure 8.
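One step of the translation that the description leaves implicit is turning glue-matching rules into guard expressions. The Python sketch below emits guard lines in the shape of the `guard(...)` statement in Fig. 8: for tile type k at location (i, j), it forms a disjunction over every set of sides whose matching glue strengths reach the temperature. The tile encoding, temperature 2, the convention that i is the x coordinate and j the y coordinate, and the helper names are assumptions, and the emitted text has not been checked against the SMART parser.

```python
from itertools import combinations

TAU = 2  # assumed temperature
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W; i = x, j = y (assumed)


def side_terms(k, i, j, tile_types, n):
    """(strength, expression) for each side of tile k at (i, j) that a neighbor can match."""
    terms = []
    for d, (dx, dy) in enumerate(OFFSETS):
        glue, ni, nj = tile_types[k][d], i + dx, j + dy
        if glue is None or not (0 <= ni < n and 0 <= nj < n):
            continue
        # Neighbor tile types whose facing (opposite-side) glue equals this glue.
        matches = [u for u, g in enumerate(tile_types) if g[(d + 2) % 4] == glue]
        if matches:
            expr = " | ".join(f"(tk(tile[{u}][{ni}][{nj}]) > 0)" for u in matches)
            terms.append((glue[1], f"({expr})"))
    return terms


def guard_expr(k, i, j, tile_types, n):
    """Disjunction over every set of sides whose glue strengths reach TAU, or None."""
    terms = side_terms(k, i, j, tile_types, n)
    clauses = [" & ".join(expr for _, expr in combo)
               for r in range(1, len(terms) + 1)
               for combo in combinations(terms, r)
               if sum(s for s, _ in combo) >= TAU]
    return " | ".join(f"({c})" for c in clauses) or None


def emit_guards(tile_types, n):
    """Yield one guard line per (k, i, j) at which tile k could ever bind."""
    for k in range(len(tile_types)):
        for i in range(n):
            for j in range(n):
                expr = guard_expr(k, i, j, tile_types, n)
                if expr is not None:
                    yield f"guard(bond[{k}][{i}][{j}] : {expr});"
```

Clauses built from supersets of a sufficient set of sides are redundant but harmless; a production front end would likely prune them to keep the rule definitions small.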
Fig. 5. Counting the number of unique configurations for a locally deterministic, rectilinear tile 
assembly system with seed at the origin on a 3 x 3 surface; the enumeration is isomorphic to the 
second-smallest Pascal triangle diamond. 
Fig. 6. Search space schematic for formal verification of a rectilinear tile assembly system. The locations marked with "1" need to be checked only once (does a unique tile bind here, or can multiple tiles bind here?), because the behavior at those locations is completely determined by the tiles already placed to the south and west. In the three-location shaded regions, however, all legal configurations must be tested, to confirm that there is no violation of rectilinearity (e.g., two tiles causing a third to bind to the south).
Fig. 7. A tile type in ISU TAS file format for the Sierpinski Triangle TAS. The file names the tile and each of its edges, and sets the bond strength of each edge.
pn SierpTri := {

// the places of the Petri net correspond to the presence (or absence) of a tile at a specific location
// first the possibility that locations are empty
for (int i in {0..49}) {
for (int j in {0..49}) {
place empty[i][j];
}}

// now the possibility that locations have tiles
for (int k in {0..6}) {
for (int i in {0..49}) {
for (int j in {0..49}) {
place tile[k][i][j];
}}}

// the transitions of the Petri net correspond to all potential bonds that may be formed
for (int k in {0..6}) {
for (int i in {0..49}) {
for (int j in {0..49}) {
trans bond[k][i][j];
}}}

// initialization command translating the tiles of the seed assembly
// to an initial configuration of tokens in the Petri net
init(tile[0][0][0]:1);
init(empty[0][1]:1, empty[0][2]:1, ... // continues for all 50 x 50 locations

// this section produces the arcs of the Petri net
// first produce (unguarded) transitions from empty location (i,j) to each possible tile at (i,j)
for (int k in {0..6}) {
for (int i in {0..49}) {
for (int j in {0..49}) {
arcs(empty[i][j]:bond[k][i][j], bond[k][i][j]:tile[k][i][j]);
}}}

// now produce guards that enable the bond transition only if the binding rule holds
// first a loop that takes care of all non-boundary locations
for (int i in {1..48}) {
for (int j in {1..48}) {
guard(bond[0][i][j] : (tk(tile[5][i][j+1]) > 0) | (tk(tile[6][i+1][j]) > 0));
// continues for all guards at all locations
}}

// the following commands generate statesets and related expressions for use by the model checking program
bigint numStates := card(reachable);
stateset nonTerminalStates := EX(potential(true));
stateset terminalStates := reachable \ nonTerminalStates;
bigint numTerminalStates := card(terminalStates);
};

// this is the model checking program based on the Petri net defined above
print("Number of reachable states for this tile assembly system: ", SierpTri.numStates);
print("Number of terminal assemblies reachable from the seed assembly: ", SierpTri.numTerminalStates);

Fig. 8. Code fragment that defines a Petri net in the SMART language to verify the unique terminal assembly of the Sierpinski Triangle TAS on a 50 x 50 surface. The TAS has seven tile types (over which k ranges in the loops). empty[i][j] is the boolean "No tile is at (i,j)," while tile[k][i][j] is the boolean "Tile type k is at (i,j)." If the transition bond[k][i][j] is enabled, it is legal for tile type k to bind at location (i,j). The guard commands determine whether to enable the transitions.



Table 1. Experimental results: verification times.

Name of TAS                   Tile types   Surface size   Verification time
Sierpinski Triangle           7            50 x 50        18 sec
                                           75 x 75        91 sec
                                           125 x 125      677 sec (11.3 min)
                                           150 x 150      1316 sec (21.9 min)
                                           175 x 175      2419 sec (40.3 min)
                                           200 x 200      4079 sec (68.0 min)
                                           250 x 250      9542 sec (2.7 hrs)
                                           316 x 316      > 6 hrs
Sierpinski Carpet             30           20 x 20        7 sec
                                           32 x 32        29 sec
                                           50 x 50        125 sec
                                           60 x 60        237 sec (3.9 min)
                                           75 x 75        548 sec (9.1 min)
                                           90 x 90        1001 sec (16.7 min)
                                           100 x 100      1463 sec (24.4 min)
                                           110 x 110      2034 sec (33.9 min)
                                           115 x 115      2608 sec (43.5 min)
                                           120 x 120      2961 sec (49.3 min)
                                           125 x 125      3345 sec (58.7 min)
                                           150 x 150      13637 sec (3.8 hrs)
Sierpinski Carpet Variant     128          10 x 10        9 sec
                                           20 x 20        59 sec
                                           32 x 32        264 sec (4.4 min)
                                           40 x 40        513 sec (8.6 min)
                                           50 x 50        1005 sec (16.7 min)
                                           55 x 55        1413 sec (23.6 min)
                                           60 x 60        > 3 hrs
Fibered Sierpinski Triangle   333          10 x 10        7 sec
                                           15 x 15        53 sec
                                           20 x 20        230 sec (3.8 min)
                                           25 x 25        433 sec (7.2 min)
                                           30 x 30        668 sec (11.1 min)
                                           35 x 35        2051 sec (34.2 min)
                                           40 x 40        2912 sec (48.5 min)
Table 2. Experimental results: memory used to define binding rules, the transition system state space, and the reachability graph (edges of the transition system graph).

Name of TAS                   Tile types   Surface size   Memory to define rules   State space storage   Reachability graph storage
Sierpinski Triangle           7            50 x 50        0.25 GB                  5.6 MB                39 KB
                                           75 x 75        0.57 GB                  28.3 MB               88 KB
                                           125 x 125      1.6 GB                   218 MB                244 KB
                                           150 x 150      2.3 GB                   452 MB                528 KB
                                           175 x 175      3.1 GB                   838 MB                478 KB
                                           200 x 200      4.0 GB                   1.4 GB                625 KB
                                           250 x 250      6.2 GB                   3.4 GB                976 KB
                                           316 x 316      9.9 GB                   ?                     ?
Sierpinski Carpet             30           20 x 20        0.51 GB                  303 KB                6 KB
                                           32 x 32        1.3 GB                   2 MB                  16 KB
                                           75 x 75        7.5 GB                   89 MB                 88 KB
                                           100 x 100      13.2 GB                  283 MB                156 KB
                                           110 x 110      16.0 GB                  415 MB                189 KB
                                           120 x 120      19.1 GB                  588 MB                225 KB
                                           150 x 150      29.9 GB                  1.4 GB                352 KB
                                           200 x 200      > 40 GB                  ?                     ?
Sierpinski Carpet Variant     128          10 x 10        1.2 GB                   19 KB                 2 KB
                                           20 x 20        5.2 GB                   303 KB                6 KB
                                           32 x 32        13.6 GB                  2.9 MB                16 KB
                                           40 x 40        21.6 GB                  7.1 MB                25 KB
                                           50 x 50        34.1 GB                  17.6 MB               39 KB
                                           55 x 55        40.7 GB                  25.8 MB               47 KB
Fibered Sierpinski Triangle   333          10 x 10        0.74 GB                  1 KB                  112 bytes
                                           15 x 15        1.7 GB                   22 KB                 656 bytes
                                           20 x 20        3.1 GB                   146 KB                2 KB
                                           25 x 25        5.0 GB                   330 KB                3 KB
                                           30 x 30        7.3 GB                   575 KB                3 KB
                                           35 x 35        9.8 GB                   1.7 MB                7 KB
                                           40 x 40        12.9 GB                  2.6 MB                9 KB
